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This paper develops a two gene, single fitness peak model for determining the equilibrium dis- 
tribution of genotypes in a unicellular population which is capable of genetic damage repair. The 
first gene, denoted by a v ia, yields a viable organism with first order growth rate constant k > 1 if 
it is equal to some target "master" sequence a V i a ,o- The second gene, denoted by oyep, yields an 
G ' organism capable of genetic repair if it is equal to some target "master" sequence a rep ,o- This model 

is analytically solvable in the limit of infinite sequence length, and gives an equilibrium distribution 
which depends on /i = Le, the product of sequence length and per base pair replication error prob- 
' ability, and e T , the probability of repair failure per base pair. The equilibrium distribution is shown 

to exist in one of three possible "phases." In the first phase, the population is localized about the 
viability and repairing master sequences. As e r exceeds the fraction of deleterious mutations, the 
population undergoes a "repair" catastrophe, in which the equilibrium distribution is still localized 
about the viability master sequence, but is spread ergodically over the sequence subspace denned 

S by the repair gene. Below the repair catastrophe, the distribution undergoes the error catastrophe 
when fj, exceeds lnfc/e r , while above the repair catastrophe, the distribution undergoes the error 
catastrophe when fi exceeds Ink/fdei, where fdei denotes the fraction of deleterious mutations. 

a' 

5j ' PACS numbers: 87.23.Kg, 87.16.Ac, 64.90.+b 
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I. INTRODUCTION 



To cope with genetic damage to their genomes, cellular organisms have developed a host of mechanisms to repair, 
and, if necessary, replace, damaged DNA. Environmental damage due to mutagens, metabolic free radicals, and 
radiation is repaired by enzymes which continuously scan the DNA molecule and repair the damaged portions. 
Replication errors are also repaired by several methods. In Escherichia coli, the DNA replicase Pol III has a built-in 
proofreading mechanism which reduces the replication error probability to 1CP 7 — 1CP 6 per base pair. Furthermore, 
immediately following replication, a second proofreading mechanism, known as mismatch repair, identifies and corrects 
| mismatched base pairs. In E. coli, the mismatch repair system reduces the error probability to 10~ 10 — 10~ 8 per base 
Q . pair Q- 

The DNA mismatch repair system is of considerable interest because it is believed that mismatch repair deficient 
strains, or mutators, play an important role in the emergence of antibiotic drug resistance and cancer 0, 0, IH El • 
Because mutators have mutation rates which are 10 to 10, 000 times higher than wild-type strains, they can more 
rapidly adapt to hostile environments, thereby explaining their potential importance in understanding drug resistance. 
However, mutators can accumulate genetic damage much more rapidly than nonmutators, and hence can serve as an 
intermediate for the appearance of cancerous cells in multicellular organisms. 

In an earlier work, we developed a simple, analytically solvable model to determine the equilibrium population of 
mutators in an asexual, unicellular population of replicating organisms [8| . The main result of this model was that 
at equilibrium, the population can exist in one of two "phases." For sufficiently efficient repair, the population was 
shown to exist in a "repairer" phase, in which the fraction of repairers is a finite, positive quantity which depends 
only on the efficiency of repair and the fraction of the genome coding for repair. The equilibrium genotype of the 
population is localized about the "master" subsequence for which repair is functionining. When the repair efficiency 
drops below a critical value, the population dclocalizcs over the repair sequence subspace, and the fraction of repairers 
becomes zero in the limit of infinite genome length. This phase is naturally termed the "mutator" phase. In [8j the 
transition from the repairer to the mutator phases was called the repair catastrophe. 

The solution of the model presented in [8| is incomplete, in that it describes the equilibrium behavior of the system 
in the low mutation rate regime. This allowed one to assume that only point mutations were important, considerably 
simplifying the calculations. However, another phase transition has also been shown to occur when the mutation rate 
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becomes too large. Above a critical mutation rate, replicative selection can no longer recover the loss of information 
due to genetic damage. This phenomenon is known as the error catastrophe, and was first predicted to occur by Eigen 
0,0]. It has since been studied in a number of theoretical papers 0,0,0,0,0,0] (and references therein), 
and also observed experimentally 0, 0] . 

Because the model presented in our paper was only solved in the point-mutation regime, it did not incorporate 
the effect of the error catastrophe. The assumption underlying our initial approach was that mutators, despite their 
higher than wild-type mutation rates, are still viable organisms, and so live well below the error catastrophe. 

The method used in our paper has since been generalized, so that it is no longer necessary to assume only point 
mutations. Therefore it is possible to obtain the equilibrium behavior for arbitrary mutation rates. Thus, the interplay 
between the error and repair catastrophes can be studied. We believe that our approach is quite powerful, and may 
be applied toward solving a large class of mutation dynamics equations. 

This paper is organized as follows: In the following section, we present a brief review of the quasispecies equations 
developed by Eigen, which are often the starting point for studies in evolutionary dynamics. We continue in Section 
III by developing the form of the quasispecies equations for our mutator model. We solve this model in Section 
IV. Specifically, we solve for the equilibrium fraction of viable organisms and viable repairers, which will allow the 
construction of a two dimensional phase diagram incorporating both the error and repair catastrophes. We develop 
a recursive formula for computing the equilibrium fraction of organisms with a given genome, and we also study the 
localization of the distribution. We also look at limiting forms of the distribution, and compare the results in certain 
cases with the corresponding results obtained in [jj. Finally, in Section V we present our conclusions, and plans for 
future research. 



II. THE QUASISPECIES EQUATIONS 

The quasispecies equations are possibly the simplest for modelling the evolutionary dynamics of a unicellular, 
asexual population of replicating organisms. We let n a denote the number of organisms with genome a, and K a 
denote the first-order growth rate constant of an organism with genome a. If K m (er, a') is taken to be the first-order 
mutation rate constant from a to a' , then the time evolution of n a is given by, 

= K a n a + ^ [«m(o-',o-)n CT ' - K m (cr, cr')n a ] (1) 

The mapping K : {a} — > {n a } defines what is called the fitness landscape. In general, the fitness landscape will be 
time dependent, since organisms usually live in dynamic environments. However, since in this paper we wish to study 
equilibrium behaviors, we take the fitness landscape to be static. 

The conversion to Eigen's quasispecies equations is accomplished by converting from absolute populations to pop- 
ulation fractions. Thus, we define n = n a , and x a — n a jn. When reexpressed in terms of the the dynamical 
equations become, 

= i K <? - K(t))x a + ^ [ K m(c', a)x a i - K m {a, <7 r )xo] (2) 

a' 

where R(t) = K a x a . Note then that R(t) is simply the mean fitness of the population, and arises as a normalization 
term which ensures that ^ CT x a = 1 at all times. 

If we define n m (cr, a) = n a — J2a'^a K m(o', a 1 ), then the quasispecies equations become, 

^TT = ^2 K rn(v',v)Xa> ~ ^{t)x a (3) 
a' 

We may simplify the notation further by defining x ~ (x a ) to be the vector of population fractions, and A = 
(A acr i = K m {cr' , a)) to be the matrix of mutation rate constants. We may also define k to be the vector of growth rate 
constants, so that R(t) = k ■ x. Then we obtain, 

— = Ax — (k ■ x)x (4) 

Eigen showed that the system evolves to an equilibrium distribution given by the eigenvector corresponding to the 
largest eigenvalue of A |^, ll(| . If the equilibrium distribution is denoted by x equ u , and the largest eigenvalue is 
denoted by A, then it is possible to show that A = K ■ x equ a . 
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To obtain an expression for K m (a' , a), let us assume that mutations occur due to replication errors. We take a per 
base pair replication error probability of e a i for a'. Let I = HD(a,a') denote the Hamming distance between a and 
a 1 . Then incorrect replication must occur at exactly I sites along the a' sequence, and correct replication must occur 
at the remaining L — l sites, where L denotes the gene sequence length. For an alphabet size of S, incorrect replication 
can be made to one of S — 1 remaining bases. Thus, the per base pair probability of replication to the corresponding 
base pair in a is e a '/(S — 1), giving a replication probability from a' to a of (^Tj-)'(l — £<r') L ~ l ■ Therefore, taking 
into account the overall replication rate, we obtain, 

Km (a',a) = K „,(-^) l (l-e (T ,) L - 1 (5) 

III. A TWO-GENE MODEL INCORPORATING ERROR REPAIR 
A. Definitions and Basic Equations 

A simple model to study quasispecies dynamics with genetic repair is a two gene, single fitness peak (SFP) model. 
We take our genome to have an alphabet size S, composed of "bases" 0, 1, . . . , S — 1. The first gene, denoted by a V i ai 
has length L viai and controls the viability of the organism. We assume that there is a unique, "fit" sequence a V i a fi 
such that K a = k > 1 if a V i a = u V i a fl- Otherwise, K a = 1. There is no loss of generality in assuming k„ = 1 for the 
unfit sequences, since time may always be rescaled so that the unfit K a become 1. 

The second gene, denoted by a rep , has length L rep , and is responsible for the enzymatic machinery involved in 
repair. As with viability, there is a unique sequence, CT rePj o, for which repair is functioning, and has a per base pair 
failure probability of e r . For all other a rep repair is inactivated, and the organism is a mutator. 

For the mutators, the per base pair replication error probability is taken to be e. Thus, for oy ep .o, the per base 
pair replication error probability is e r e. If e a denotes the per base pair replication error probability of genome a, then 
e a = e r e if a rep = <J r ep,o, and e otherwise. 

This model is clearly an oversimplification of the actual genome and replication dynamics of an organism. Nev- 
ertheless, a two gene, SFP model is probably the simplest for studying evolutionary dynamics with genetic damage 
repair, and it is therefore a natural starting point before considering more complicated systems. Despite its simplicity, 
this model still yields sufficiently rich behavior to be of interest. 

To determine the equilibrium distribution of genotypes in this model, note that, by symmetry, we may assume that 
x a depends only on the Hamming distance I = H D{a V i a , a V i a ,o) an d I' = HD(a rep ,a rep fl). We define the Hamming 
class HC(l,l') = {a = a V i a (T rep \H D(cr V ia, <Jvia,o) = I, HD(<j repi (Jrepfl) = I'}- Since x a is assumed to depend only on 
the Hamming class of a, we may define xw = x a for a G HC(l,l'). We may also note that n a = k if I = and 1 
otherwise, so that n a depends only on I. Therefore, we redenote k„ by ni. Similarly, we redenote e CT by ty . 

We wish to express the quasispecies equations in terms of the xu> . To do this, we need to sum the mutational 
contributions of all a to the time evolution of xu> . Let <ru> G HC(l, I'). Any a may be obtained from aw by changing 
the appropriate bases. Let us write aw — a v i a ja rePt ii , and a — a V i a a rep . By definition of the Hamming class, a V i a j 
differs from a V i a _Q in exactly I places. Therefore, a V i a ,i is identical to a V i a .o in L V i a — I places. Of these L„j — I 
bases, let li denote the number of bases which are changed in a. Of the I bases in a via j which are distinct from 
the corresponding bases in a V i a ,o, let I2 denote the number which are changed back to the corresponding bases in 
&via,o when creating a, and let I3 denote the number which are changed to bases which are still distinct from the 
corresponding ones in a via $. The base changes determined by l\, I2, and I3 yields a a via which is a Hamming distance 
of l\ + I — l 2 from ayiafl, and a Hamming distance of l\+h + h from a via ^. 

For the repair gene, we may define l[, 1' 2 , and 1' 3 similarly. Thus, given some aw G HC(l,l'), the vector 
(£1,^2,^3,^1,^2^3) defines a set of base changes to a cr; 1+ ;_; 2 .;' i+ ;/_;^ G HC(h + I — I2J1 + I' — l 2 ), such that 
HD{aw,^h+i-i 2 ,i' 1 +v-i' 2 ) =h+h + h + l[ +l' 2 + l 3- We then obtain that, 

K m (v h+ i-i 2 M 1+ i'-i>,mi>) = m 1+ i- l / J ^^ (6) 

The total mutational flow rate into a given aw may be obtained by summing over the mutational flow rates from 
all possible l 2 , h, l[, l 2 , ^3)- To put together a final expression, we still need to account for degeneracy, since, in 
general, for a given aw and vector l 2 , 13, l[, 1' 2 , 13), there are multiple ways for generating a new gene sequence. For 
a given li, we need to choose l\ elements out of L V i a — I. Since each selected base can be changed to S — I other bases, 
the total number of possibilities for Zi is { L "\ a 1 ~ l ) (S — I)' 1 . For a given l 2 , we need to choose l 2 elements out of I. Since 

each selected base is restored to the corresponding base in a V i a ,o, the total number of possibilities for l 2 is (jj . Finally, 
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for a given ^3, we need to choose ^3 elements out of the remaining l — l<i- Since each selected base is changed, but is not 
changed back to the corresponding element in o via Q, there are S — 2 possibilities per changed base, hence the total 
number of possibilities for l 3 is ( 7 3 2 ) {S — 2)' 3 . Performing a similar analysis for the repair gene, and putting everything 
together, we obtain a total sequence degeneracy of ( Lv \ a ~ l ) (/ 2 ) Cij 2 ) ( Lr jr 1 ') (J,) {S-l) h+l ^ (S-2) l3+l 's . Putting 

everything together, we obtain, 

i 1= o ; 2 =o; 3 =o i[=o v 2 =ov 3 =o v / \ -«/ \ ^ / \ 1 / \ 2/ \ 3 / 

^ 1+( - i2 ( ! ^ ± 1 i ) il+ ' 2+ ' 3+ '' 1+ ^ +4 (l - er 1+ r-; 2 ) L " i<1+I '- ei> - il - i2 -' 3 - i ' 1 -^-'^/ 1+i - i2 ,r 1+ /'- i2 - (7) 
We may sum over I3 and l 3 to obtain, 
L via -l 1 L rep -l' 1' 



dxn' \ \ ^— \ ^— \ f L v i a l\(l\f L-rep ~ l\ ( I 

dt 



e r 1+ i'-i 2 V 1 _ e i' 1 +i'-i' 2 ) I 1 + (5 _ _ ei , +v _ v y 2 xi 1+ i-i 2 ,i' 1+ i'-i' 2 

-R{t)xw 



ii=0 i 2 =0 (i=0 ( 2 =0 v / \ \ 1 / \ 2/ 

ii+'i H ,„ ■,Ly ia +L r „-l-l'-l 1 -l\ ( e l'i+l'-l' 2 \l 2 +l' 2 (-i e l' l +l'-l' 2 \l+l'-l 2 -l' 

e i' 1 +i'-i 2 y L ~ e i[+i'-i' 2 ) U s _ x J H 1 — s_ 1 ^ 2 Xh+i-i 2 ,i[+i>-i 2 

-R{t)x w (8) 

Before proceeding, we introduce the following definitions: Define Cu> to be the number of sequences in HC{1, V). Note 
then that Cw = ( L "f a ) { L ]f p ) {S— . Define zw to be the fraction of the population in H C(l, I'). Then zw = Cu>xu>. 

Define zq to be the total fraction of viable organisms, so that z a — Yl,v=o z oi' ■ Then R(t) = (k — l)z + 1 . Reexpressing 
our dynamical equations in terms of the zu>, we obtain, after some manipulation, 

dzui v J-^ / L V i a — I — li + ?2\ fh+l — h\ ( L rep — V — l[ + l' 2 \ fl[ + I' — l' 2 \ 

^" ££,§£( <■ )\ h A n An h*- x 

J2+1' 2 /-, \L via +L rep -l-l' -h-l[ ( e l'i+ l '- l 2 \h+l\ (1 £ l' 1 +l'-l' 2 w + ;'_; 2 _;' 

e z' 1 +/'-z^ U _ £ ii+i'-i 2 i H ^ _ x j U 1 ^ - 1 j 2 Zl 1 +l-l 2 ,l> 1 +l'-l< 2 

-{{k - 1)Z + 1)Z W (9) 

The equilibrium solution is obtained by solving the equations obtained by setting the left-hand side to zero. The 
numerical solution of the equilibrium equations is discussed in Appendix A. 

B. Behavior in the Limit of Infinite Sequence Length 

We now let the viability and repair sequence lengths L V i a , L rep approach 00, while keeping a = L V i a /L rep , /1 = Le, 
and e r fixed, where L = L via + L rep is the total sequence length. Since e;' i+ ;<_^ = e or e r e, it is clear that = 
Lei' i+ ii _[> 2 remains fixed in the limit L — > 00. 

We claim that, for a given I, I', the only terms which survive the limiting process are the l\ = l[ = terms. We 
then note that, as L V i a , L rep —* 00, 



L v i a — l + h\ i 2 1 i T ,u 1 , a 
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and, 

(1 - e,,.,,)^'--' e -*""-'i (11) 
Taking similar limits for the L rep terms, we obtain the infinite sequence length equations, 

i 2 =ai' 2 =o 1 

Expanding out the terms, and redenoting l 2 by h, and V 2 by l[, we obtain, 

l[=0 1 

(-1 1 (-1 I'-l 1 

Ji=o ; 1= o/' 1= o 1 

-((fc-l)z + l)z«> (13) 

To understand why only the l\ = l[ = terms survive, let us consider the mutational contribution from those 
Zh+i-h.i^+i'-i'? f° r which at least one of h, l[ > 0. A a' G HC{l\+l — l 2 , l[ +1' — 1' 2 ) was obtained from a cr G HC(l, I') 
by changing ii of the L„j — ' bases in a v i a which were equal to the corresponding bases in <7 V i a ,o, arid similarly for 
l[ and a rep . Therefore, for a' via to mutate to a v { a , h of the changed bases must back mutate to the corresponding 
bases in a v i a fi- However, in the limit of infinite sequence length, the number of unchanged bases in <r' via , given by 
L V ia — h — I + h, becomes infinite, and so the probability of a mutation occurring at one of those bases approaches 
1, so that the probability of back mutation goes to 0. 

This heuristic argument is given a more rigorous justification in Appendix B. As a simple check, we also ensure 
that total population is conserved in the limiting process. 




IV. SOLUTION OF THE MODEL 
A. The Phase Diagram 

We begin our solution of the model by computing the equilibrium values of z and z o- We begin with the dynamical 
equations for Zoi>, 

We may sum from I' = — oo to obtain the dynamical equation for z - Together with the dynamical equation for z o, 
we have the pair of equations, 

^ = fc(e-*W - e-*^ 00 + (fce"*" - (k - l)z - l)z (15) 

= (ke~^ - (k - l)z - 1)300 (16) 

We may obtain the equilibrium solution of these equations by setting the left hand sides to zero. A summary of the 
possible solutions is given in Table I. 

We need to map out the regions in the (/i, e r ) parameter space for which the various solutions are valid. First of 
all, note that we must have zq G [0, 1] and zoo G [0, zq\. For the first solution set to hold, we must therefore have 
< ke~ Cr ^ — 1 < k — 1. The second inequality is automatically satisfied. For the first inequality to hold, we must 
have e r fi < Ink. In order for z 0Q G [0, z ], we must then have < (e~ £r ^ - e^^rr^) / \e~^ trtl - e~^ M ) < 1. Again, 
the second inequality is automatically satisfied, but the first only holds when e r < ^rj- Therefore, the first solution 
pair is only valid when e r fi < Ink, and e r < However, the other two solution pairs may still yield physical values 
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TABLE I: The possible equilibrium values of (zo, zoo) as a function of n and e r . 




3.454 



FIG. 1: Diagram illustrating the solution domains fii (black), fi2 (grey), and Q3 (white). The \x axis is labelled only at 
\nk/e r ,crit ~ 3.454, while the e r axis is labelled at 0, e r ,crit = 2/3, 1. 



for (zo, zoo) in the domain of validity of the first solution pair. To resolve this issue, we note that we want a solution 
which gives z o — > 1 as e r — > 0. That is, if repair is perfect, then at equilibrium the population should only consist 
of viable repairers. Therefore, as e r — ► 0, we expect the first solution pair to hold, since it gives the correct limiting 
behavior. By continuity, the first solution pair holds over the set fix = {(fi, e r ) £ [0, oo) x [0, l]|e r /x < In k, e r < -^j}- 
As e r is increased beyond ^r^, the first solution is no longer valid, but the second solution may still be valid if 

< fce^^rr^ — 1 < k — 1. Again, the second inequality is automatically satisfied, while the first only holds when 
T^rrM ^ The third solution pair may still be physical in the domain of validity of the second solution pair. To 
resolve this issue, we may note that we want a solution which gives zq — > 1 as fi — > 0. That is, in the limit of no 
replication errors, all of the population is viable. Therefore, as \i — > with e r > -^j, we expect the second solution 
pair to hold, since it gives the correct limiting behavior. By continuity, the second solution pair holds over the set 
^2 = {(Mi £ r) £ [0> oo ) x [0, < In k, e r > ^j}. The third solution is then the solution over the domain 

n 3 = ([o,oo)x[o,i])/(niun2) a 
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FIG. 2: Plot of A for a = 2, k = 10. 



Figure 1 illustrates the three solution domains Qi, ^3 for a = 2, fc = 10. In S7i , the population is clustered within 
finite Hamming distances about the viable and repair "master" sequences. A finite positive fraction of the population 
is viable, and of the viable organisms, a finite positive fraction of the population is capable of repair. As e r is increased 
beyond j^rji the population becomes delocalized over the repair gene subspace, and the fraction of repairers becomes 
zero. This phenomenon is known as the repair catastrophe, and was first predicted in Nevertheless, if fx is still 
sufficiently small so that -^1^ — ^ n ^' then the population is still localized about the viable "master" sequence, and 
the fraction of viable organisms is positive. In this regime, as fi is increased so that -^jLi > hik, the population 
delocalizes again over the viable gene subspace, and the fraction of viable organisms drops to zero. This phenomenon 
is known as the error catastrophe. 

It is not necessarily true that the repair catastrophe is encountered before the error catastrophe. Defining e r ,crit = 
^7-]-, then, whenever e r < e r ,crit, the first solution set becomes unphysical when fx > lnfc/e r . However, the second 
solution set is also unphysical, since then e r ^ cr i t Li > e r /i > In fc, so that the third solution set is the valid one. Thus, 
the population goes through the error catastrophe without going through the repair catastrophe. 

The direct transition through the error catastrophe can also occur when e r is varied at fixed \x. When fx > In fc/e^cHt, 
then, as e r is increased from to 1, fce _£r/i — 1 drops below zero before e r = e r , C rit- Therefore, the first solution set 
becomes unphysical before the repair catastrophe occurs, but the second solution set is also unphysical, meaning the 
third solution set is the one that is valid. Thus, for all \x > In k/e r , C rit, as e r is increased from to 1, the population 
undergoes the error catastrophe at e r — In k/ix < e r ,crit , so that the repair catastrophe is never observed. 

We may use our three solution pairs to compute A = R for the three solution domains, or "phases." We have 
A = (fc — l)z + 1, so that, 



Figure 2 shows a plot of A versus (/i, e r ) for a = 2, fc = 10. Figure 3 shows the corresponding plot for z 00 . Note 
that A is continuous, but not dX/d/i and d\/de r . The error and repair catastrophes are therefore second-order phase 
transitions. 

The error and repair catastrophes both arise as a result of the interplay between two competing effects: (1) The 
selective advantage for being viable and for being a repairer, and (2) The entropic tendency to be unviable and a 




(17) 
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FIG. 3: Plot of zoo for a = 2, k = 10. 



mutator. For a sufficiently low mutation rate, the selective advantage for being viable is strong enough to localize 
the population about <7„i a ,o- However, when the mutation rate exceeds a critical value, the selective advantage for 
being viable is no longer sufficiently strong to localize the population about the viable "master" sequence, and the 
population delocalizes over the entire viability subspace. Below the repair catastrophe, this occurs when the effective 
growth rate of the viable, repairing sequence <J V ia,o&rep,o becomes comparable to the growth rates of the nonviable 
sequences, i.e., when ke~ er ^ 1 — 1. Above the repair catastrophe, there is no longer any preference for being a repairer. 
The effective growth rate of the viable sequences due to mutation off of the viability peak is fee "+ 1 , hence, above 
the repair catastrophe, the error catastrophe occurs when ke "+ 1 = 1. 

Below the error catastrophe, viable repairers have a slower rate of mutation off of the viability peak than viable 
mutators, and hence have a higher effective growth rate. For sufficiently efficient repair, this discrepancy causes 
localization about cr rep .o- However, when the repair error probability exceeds t r ,crit = ^rj = L V i a /L, the selective 
advantage for being a repairer is no longer sufficient to localize the population, and the distribution undergoes the 
repair catastrophe, in which the distribution delocalizes over the repair subspace. Note that e r ^ cr n is simply the 
fraction of deleterious mutations, and increases with increasing a. This makes sense, since, the greater the fraction 
of deleterious mutations, the greater the relative advantage for being a repairer. Thus, for large a, repair has to be 
highly inefficient before the repair catastrophe occurs. Conversely, for low a, repair has to be highly efficient to give 
the repairers a sufficiently large advantage for the distribution to localize about the repair "master" sequence. 

It should be noted that the error and repair catastrophes are similar to thermodynamic phase transitions, in that 
they both arise from a competition between maximum fitness (minimum energy) and maximal entropy. When the 
replication and repair error probabilities arc suficiently low (low temperature, high pressure, say), maximal fitness 
(minimum energy) wins out, leading to localization on the sequence space. When the replication or repair error 
probabilities are sufficiently high (high temperature, low pressure), maximal entropy wins out, leading to delocalization 
on the sequence space. While not exact, this analogy nevertheless conceptually describes the origin of the phases 
observed in this study. 
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B. A Recursive Formula for the Population Distribution 

Given zo,zoo, the equilibrium equations may be solved recursively to obtain any zw for a given (/i, e r ) pair. For 
I' > 0, we have, 



dt 

so at equilbrium we have 



( x =i 



(18) 



Z0i ' = (fc. ,.-, '-(-^'' ' ""--".» + - " > , -l-^'' 1 :<..,' I (1!)) 

We next turn our attention to zi for I > 0. We have, 

w = ^ { ^ri )le ^ Zm + e ~ e " £ h { ?fi )llzi - hfi + e ^ zm {{k 1)zo + l)zm m 

h = l 

so at equilibrium we have, 

Zl0= (k-l)z + l-e-^n(—^ e " Zm + e "2.^(^1)^.0) ( 21 ) 
Finally, we compute the equilibrium value of recursively for i, i' > 0. The result is, 

(1 — ti=l 

-1 ('-1 



+ yy J_ a 'i(_^L_ 



) ll *l-h,l'-i0) ( 22 ) 



C. Localization Lengths 

The final set of quantities we wish to compute are the following localization lengths of the equilibrium distribution: 

oo 

(l') ma ee J2 l ' z °i' (23) 

i' = l 

oo 

(Z)rep = ^/-ZJO (24) 

(0 = EE ! * (25) 

(=1 Z'=0 

(0 = EE''* (26) 



= 1 1=0 



Using the dynamical equations for the zw we may compute the various localization lengths at equilibrium. The basic 
idea is to obtain an expression for the time derivatives of the localization lengths in terms of the localization lengths 
themselves, and then solving for the equilibrium value. We illustrate the technique for (l') v i a . We have, 
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E 1 ' 



dt f—> dt 



Q, _|_ I £ ■ ^ * 



E E ^(^^ - ((* - 1)* + ixo. 

oo 1 

k Jr^ e -^^ Zoo + ke - M v (-V—yi((i') via + i[( Z0 _ zoo)) - ((* - i)«d + i)<0i 

a + 1 ^ — ' L\\ a + 1 



= fc-^- e -^<%o + Ae-*"«Z'>„ ia + -4t(*o - z oo)) - ((A - l)*b + l)<i'U (27) 
a + 1 a + 1 

so at equilibrium we obtain, 

[ ' ma a+1 (fc-l)z + l-fce-^ 1 1 

To compute the remaining localization lengths using the above approach, we first need to compute z' Q = X^o^O' 
Note that z' is simply the total fraction of repairers. We compute z' by evaluating dz' /dt — YmZo dz^/dt. The result 
is an expression in terms of z' , Zq, and zqq, which may be solved at equilibrium to obtain, 

, (k - l)e"^ 

4 = ~T. TT ~ 3E 2 «0 (29) 

(k - 1)zq + 1 - e c.+i 

We then obtain, 

{l)rep = EW e - m (fc-l)*ao + *& (30) 
"+1 (fc - l)z + 1 - e~^+r 

((/), (0) - -(0, (/')«„) + TTT-Tx (( fc ~ + !-(!- *0*o - (* - 1)(1 - *r)*oo)(", 1) (31) 
zq (k-l)(a+l)z 



D. Limiting Forms of the Distribution 

It is instructive to study the behavior of the distribution in the following limiting cases: (1) fj, — ► 0. (2) a — * oo. 
(3) a — > 0. We handle each of these cases in turn. 



1. Behavior in the Limit (i—tQ 

For ji — > 0, note that zo — ► 1, and zoo — ► (^^j — £r) / (^jxt(1 ~~ e r)) = 1 — ^r/ipc{\ — e r )), below the repair catastrophe. 
We may also note that z' Q — > zoo, and ((I), (l 1 )) — * (0, (l') V i a ). This makes sense since in the limit /i — * 0, we expect 
that the entire population becomes viable. For the same reason, (l) re p — > as /i — > 0. Finally, as [i — > for e r < e r ,crit, 

(I )via ► (Q,+ i)(_s__ Cr ) (1 — (1 — e r)(l — Q (l-e,.) )) = (a + l)(e r crit-f-r) €r ~a~ ~ (1 ~ e r,crit) / £r,crit x ^r/(^r,crit ~ e r)- As 

expected, these results agree with the point-mutation limit expressions obtained in 



2. Behavior in the Limit a — » oo 



For a — > oo we obtain e r .crit = 1- Hence, we are always below the repair catastrophe. As long as /i < lnfc/e r , 
then z o — » z = (fce~ ErA1 — l)/(fc — 1). Thus, the solution pairs presented in Table I reduce to two possible solutions. 
Either zo = zoo = (fce _CrM — l)/(fc — 1) if we are below the error catastrophe (e r /i < lnfc), or zo = zoo = if we are 
above the error catastrophe. This means that the fraction of mutators is always zero. To understand this behavior, 
note that the probability of mutating off of the repairer sequence is 1 — e "+ 1 , while the probability of mutating off 
of a mutator sequence is 1 — e "+ 1 . Both go to as a — > oo. However, since for e r < 1 the repairer sequence has a 
greater selective advantage than the mutator sequence, the repair strain comes to dominate the population. Only at 
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e r = 1 is there an ambiguity, since (e _€ '' M — e~°+ T )/(e~"°+ r — e~°+r) — > 0/0, which is undefined. Physically, since 
at e r = 1 there is no difference between what we call a "repairer" and a "mutator," we expect derealization over the 
repair subspace, so that zqq — > 0. 

We may also note that Zq — > zoo/^o = 1 for £r < 1) an d for e r = 1. We also have {l) re p — > ke r fie~ erfl / (fce _£rM — 1). 
Also, (l') V ia — > 0, for e r < 1, and oo for e r = 1. 

3. Behavior in the Limit a — > 

For a — > 0, we have e r ,crit - > 0. Therefore, for all e r > we are beyond the repair catastrophe, and since 
^rj/-t = < lnfc, we are below the error catastrophe as well, so that zq — 1 with zoo = 0. This makes sense, since, 

for a = 0, the probability of mutating off of the viability peak is 1 — e^^ 1 ^ — > 0. Thus, the entire population 
is viable at equilibrium. As for zoo, we note that zqq = for e r > 0, but for e r = we obtain the expression, 
(e° — e°)/(e° — e°) x zo = 0/0. Physically, we must have zoo = 1 at e r = 0. This ambiguity is therefore resolved by 

letting a — » for e r = 0. That is, we evaluate zoo for a = 0, e r = by setting Zoo = lim a ^o £oo 

I a=0,e r =0 I a,e r =0 

As expected, for e r > we have z' = 0, (r)„ ia = oo, (l) re p = 0, (Z) = 0, and (/') = oo. Again, for e r — we resolve 
any ambiguities by letting a — ► 0, giving, as expected, Zq = 1, (l'} V i a — (l)rep = (0 = (0 = 0. 

V. CONCLUSIONS AND FUTURE RESEARCH 

This paper presented a two gene, single fitness peak model to determine the equilibrium distribution of genotypes 
in a unicellular population capable of replication error repair. The work presented here was a continuation of in 
which the equilibrium distribution of mutators was studied for mutation rates well below the error catastrophe. This 
paper obtained the equilibrium behavior of the two gene model for arbitrary mutation rates, thereby incorporating 
both the error and repair catastrophes into a single, two-dimensional phase diagram. While our model is probably 
the simplest one could use for studying evolutionary dynamics in the presence of genetic repair, it does nevertheless 
make experimentally testable predictions. As mentioned in the Introduction, the error catastrophe has already been 
observed |l7lll8(. The repair catastrophe would be more difficult to observe experimentally, since it would be necessary 
to selectively interfere with the DNA mismatch repair system. If possible, however, it would be interesting to try to 
experimentally map out the phase diagram shown in Figure 1 for an actual organism, such as E. coli. 

In it was noted that the equilibrium distribution of mutators did not depend on fi, but only on e r and a. This 
was interesting since the larger the value of [i, the greater the difference in mutation rates off of the viability peak 
between repairers and mutators. One might also naively expect the repair catastrophe to disappear entirely as n — ► 0, 
since the difference in viability between repairers and mutators disappears in the limit of no mutations. In [j| it 
was assumed that mutations were sufficiently slow so that only point mutations needed to be considered. In the 
complete model, when we allow for mutations between any two genomes, we do indeed obtain a /z-dependence on 
the equilibrium distribution of mutators. Interestingly, however, the repair catastrophe still occurs at t r ^ C rit — -^i, 
unchanged from the point-mutation result in Q . 

We believe that the solution technique developed in this paper may be used to solve a large class of mutation 
dynamics equations. To illustrate, consider a more general genome consisting of N genes, so that the full sequence 
cr may written as a = ci . . . <r/v. We assume that there exist "master" sequences <ti,o, ■ • ■ i cjv.Oj such that the 
properties of each a n depends only on the Hamming distance HD(<r n , a n fi)- We then can define the Hamming class 
HC(l%, . . . , In) = {c = <7i . . . <JN\HD(a n , cT n ,a) — In, n — 1, . . . , N}. Consider then some o'i lt ,„ t i N € HC(h, . . . , In). 
For each n £ {1, . . . , N}, define l n i,l n 2,l n 3 analogously to lx,l2,h and ix>^2>^3 fr° m Section III. A. Then the vector 
(hi, I12, ha, ■ ■ ■ , Ini, In2, lm) defines a set of base changes to a new genome o-i 11 +i 1 -i 12 ,...,i N i+In-In2 e HC(l n + 
h ~ h2, ■ • ■ i ^jvi + In — In2), which is a Hamming distance of In + hi + ^13 + • • • + Ini + ZjV2 + hv3 from cr^ i N ■ 
Proceeding as in Section III. A, we may define z;^...^ to be the fraction of the population in HC(h, . . . ,In), and 
obtain an expression for dzi lt „ i N / dt which is a generalization of the expression given in Eq. (9). Presumably, the back 
mutation terms should drop out in the limit of infinite sequence length, giving an infinite sequence length equation 
similar to Eq. (12). 

For future research, we would like to move away from studies of equilibria and focus on the role that mutators play 
in dynamic environments. Incorporating the effects of horizontal transfer between organisms will also be useful for 
exploring phcnomcnological aspects of antibiotic drug resistance. We also seek to develop more realistic replication 
models, incorporating the double-stranded nature of the DNA molecule. In our current model, we essentially "black- 
boxed" the replication dynamics, and assumed that the double-stranded DNA could be represented as a single symbol 
sequence. While the complementary nature of the double helix makes this assumption technically correct, the actual 
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replication dynamics with two complementary strands is somewhat different than the single-strand model used in this 
paper. 

Finally, we plan to develop collaborations with experimental groups working on mismatch repair, and attempt to 
devise possible strategies for tuning the efficiency of the mismatch repair system. If successful, such experiments would 
give direct confirmation of the repair catastrophe, and provide a better understanding of error correction mechanisms 
in biological systems. 
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APPENDIX A: NUMERICAL SOLUTION OF THE MODEL FOR FINITE GENOMES 

Equation (9) in Section III. A gives the expression for the Hamming class symmetrized dynamics equations of our 
model. We can put this equation into matrix form, 

^=Bz-{k-z)z (Al) 

where z = (zw) is the vector of population fractions in the various Hamming classes, B is the matrix of first-order 
mutation rate constants between the various Hamming classes, and k is the vector of first-order growth rate constants 
for the various Hamming classes, so that k ■ z = X^'=o K ll' z ii', where kw is simply ki in our model. 

The equilibrium distribution may then be solved using fixed-point iteration, via the equation, 

z n +i = -J—rBz* n (A2) 

K • Z n 

In principle, the iterations are terminated when the z n stop changing. We introduce a fractional cutoff parameter 
S, and stop iterating when (z n+ N e ,w — z n ji>) / z ni w < S. N € is chosen to be sufficiently large so that on the order of 
one mutation is allowed to occur after N e iterations, to ensure that equilibration is being accurately measured. For 
a large sequence length L, the probability of correct replication is e~ Le , so the probability of incorrect replication is 
1 - e~ Le . Therefore, taking N e = 1/(1 - e- Le ) ensures that on the order of one incorrect replication has occurred, so 
that if (z n+ N t ,w — z n ,ll') I z n ,W < $ for all I, I', then it is possible to assume that equilibration has been achieved. 

Note that what this method does is account for the fact that equilibration takes longer for smaller values of e, i.e., 
for slower mutation rates. Since lim^o N e = oo, and lim e ^i N e « 1 for large L, we see that our choice of N e accounts 
for the slower equilibration rate by iterating more times before comparing the changes in the Ziy . In our numerical 
simulations, we found that 6 = 10~ 4 — 10~ 3 was sufficient to achieve good convergence. For a = 2, k = 10, it was 
found that for L — 30 the equilibrium values of z and z o were almost identical to their L — oo values. For this 
reason, we did not give figures showing the results of numerical simulations in this paper. 

APPENDIX B: JUSTIFICATION OF THE INFINITE SEQUENCE LENGTH FORM OF THE 

DYNAMICAL EQUATIONS 

To establish the infinite sequence length form of Eq. (9) in Section III. A, we need to first establish some basic 
inequalities, to facilitate the computation of upper bounds. We begin with the following inequality, for > 0: 

1 ' 1 fe=i 1 fe=i 

Note that this inequality also holds for l\ — 0. A similar inequality holds for the primed indices. Our next inequality 
is simply, 

Lvia - 1 - h + h Vi 2 n \L V i a -l-h <- -, (TZ9\ 

e v 1+ i> -v 2 y l - t^+v -i' 2 ) ( B2 > 
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and similarly for the primed indices. Finally, we may note that zw < 1 for all I, V . Now, to simplify the calculation, 
denote the summand in Eq. (9) of Section III. A by Su^iy y . Then putting together our inequalities, we obtain, 

l l' L via -l l L r<lp -l' i' 

Y Y S W0l> 2 < E Y Y Y S UM'V^ 

l 2 =0l' 2 =0 2i=0 h=0 l' 1= l' 2 =Q 

I I' L via -l I l' , 

< Y Y w« s +YYY h^Y 1 

l 2 =0l' 2 =0 h=l l 2 =0l' 2 =0 

Lrep-l' I' I L via -l I L rep -l' I' 

+ Y EE^ e)i;+ EEE Em^/^^ 1 

c 1= i v 2 =oi 2 =v h=i i 2 =a i[=i i' 2 =o 

ii' i _ ( l+i c )L„in-l 

= EE Wo/i+^ + i) 2 a'+i)^T — t^jtt — 

i 2 =0Z'=0 1 S-l £ 



+k{l + !)(/' + !) 



2 



e 



5-i i-m e 



+k (i + if {i' + 1) 2 (^-) 2 — M^m — ( B3 ) 

1 5 _ 1 e 1 s _ 1 e 

Now, following the argument from the beginning of Section III.B, we have that, as L V i ai L rep — > oo at fixed a,/j,,e r , 
we get that, 

E E w«i -EE ^« Z2 (?TT)' 2+ " e " ! '^^^-^ ( B4 ) 

hence, since e — > at fixed \x when L rep — ► oo, we see from the inequalities given in Eq. (B3) that, 

L via -l I L rsp -l' i' I I' 

YYY E^m^^EE^^S^ 6 "''^^^^ (B5) 

i 1= l 2 =0 l[=0 l' 2 =0 l 2 =0l' 2 =0 2 ' 

The convergence is not uniform, however, since our upper bound depends on I, I'. 

This establishes the infinite sequence length form of our dynamical equations, as given in Eq. (12) of Section III.B. 
We may verify that total probability is conserved in our limiting process. Defining z — ; , zu>, we obtain, 



dz 
~dt 



OO OO 



I I' 



EE E E wt^^rf ) ,1+, ' le """- , -'-' 1 ."-" 1 

J=0J'=0!i=0J' 1= 1 1- + 
oo oo oo oo 

E E E E wr^(fri ) h+l[ ^'-^<-^ - «(*) 

oo oo oo oo 1 

E E « fe e-^^.M E E ^^(iri)^' 1 -*(*)* 



oo oo 



= E E ~ 

fci=0 fci=o 

= «(t)z - R(t)z = (B6) 

Thus, since z starts off at 1, it remains 1 at all times, hence total probability is conserved in the infinite sequence 
limit. 
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